pacman::p_load(tmap, sf, DT, stplanr,
performance,
ggpubr, tidyverse)Hands-on Exercise 3
15 Processing and Visualising Flow Data
15.1 Overview
Spatial interaction represent the flow of people, material, or information between locations in geographical space.
Each spatial interaction, as an analogy for a set of movements, is composed of a discrete origin-destination pair. Each pair can be represented as a cell in a matrix where rows are related to the locations (centroids) of origin, while columns are related to locations (centroids) of destination. Such a matrix is commonly known as an origin-destination matrix, or a spatial interaction matrix.
In this page, I show how I had completed the Hands-on Exercise 3 on building an Origin-Destination (OD) matrix by using the Passenger Volume by Origin Destination Bus Stops data set downloaded from LTA DataMall.
The objectives are:
Import and extract OD data for a selected time interval;
Import and save geospatial data (i.e. Bus Stop Location and Master Plan Subzone Boundary) into sf tibble data frame objects;
Populate planning subzone code into bus stops sf tibble data frame;
Construct desire lines geospatial data from the OD data; and
Visualise passenger volume by origin and destination bus stops by using the desire lines data.
15.2 Getting Started
The R packages used in this hands-on exercise are:
sf for importing, managing, and processing geospatial data;
tidyverse (i.e. readr, tidyr, dplyr) for performing data science tasks such as importing, tidying, and wrangling data;
tmap for thematic mapping;
DT a wrapper of the JavaScript Library ‘DataTables’ for creating interactive and dynamic data tables;
stplanr for working with spatial data related to transportation and urban planning;
performance for the assessment of regression models performance; and
ggpubr for creating customised and annotated ggplot2 plots for better visualisation.
15.3 Preparing the Flow Data
15.3.1 Importing the Aspatial Data
Firstly, we will import the Passenger Volume by Origin Destination Bus Stops data set downloaded from LTA DataMall by using read_csv() of readr package.
odbus = read_csv("data/aspatial/origin_destination_bus_202310.csv")
glimpse(odbus)Rows: 5,694,297
Columns: 7
$ YEAR_MONTH <chr> "2023-10", "2023-10", "2023-10", "2023-10", "2023-…
$ DAY_TYPE <chr> "WEEKENDS/HOLIDAY", "WEEKDAY", "WEEKENDS/HOLIDAY",…
$ TIME_PER_HOUR <dbl> 16, 16, 14, 14, 17, 17, 17, 7, 14, 14, 10, 20, 20,…
$ PT_TYPE <chr> "BUS", "BUS", "BUS", "BUS", "BUS", "BUS", "BUS", "…
$ ORIGIN_PT_CODE <chr> "04168", "04168", "80119", "80119", "44069", "2028…
$ DESTINATION_PT_CODE <chr> "10051", "10051", "90079", "90079", "17229", "2014…
$ TOTAL_TRIPS <dbl> 3, 5, 3, 5, 4, 1, 24, 2, 1, 7, 3, 2, 5, 1, 1, 1, 1…
The values in “ORIGIN_PT_CODE” and “DESTINATON_PT_CODE” are in character data type. Hence, they are converted into factor data type.
odbus$ORIGIN_PT_CODE = as.factor(odbus$ORIGIN_PT_CODE)
odbus$DESTINATION_PT_CODE = as.factor(odbus$DESTINATION_PT_CODE) 15.3.2 Extracting the Study Data
The commuting flows on weekdays between 6am and 9am are extracted.
odbus6_9 = odbus %>%
filter(DAY_TYPE == "WEEKDAY") %>%
filter(TIME_PER_HOUR >= 6 &
TIME_PER_HOUR <= 9) %>%
group_by(ORIGIN_PT_CODE,
DESTINATION_PT_CODE) %>%
summarise(TRIPS = sum(TOTAL_TRIPS))
datatable(odbus6_9)The output is saved in rds format. The rds file is then imported into the R environment.
write_rds(odbus6_9, "data/rds/odbus6_9.rds")
odbus6_9 = read_rds("data/rds/odbus6_9.rds")15.4 Importing the Geospatial Data
The two geospatial data sets used in this hands-on exercise are:
BusStop: This data provides the locations of bus stops as at July 2023.
MPSZ-2019: This data provides the sub-zone boundary of URA Master Plan 2019.
Both data sets are in the ESRI shapefile format.
They are imported and transformed using the st_read() and st_transform() functions in the sf package. The outputs are the busstop and mpsz sf data frames.
busstop = st_read(dsn = "data/geospatial",
layer = "BusStop") %>%
st_transform(crs = 3414)Reading layer `BusStop' from data source
`C:\jmphosis\ISSS624\Hands-on_Ex\Hands-on_Ex3\data\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 5161 features and 3 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 3970.122 ymin: 26482.1 xmax: 48284.56 ymax: 52983.82
Projected CRS: SVY21
mpsz = st_read(dsn = "data/geospatial",
layer = "MPSZ-2019") %>%
st_transform(crs = 3414)Reading layer `MPSZ-2019' from data source
`C:\jmphosis\ISSS624\Hands-on_Ex\Hands-on_Ex3\data\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 332 features and 6 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 103.6057 ymin: 1.158699 xmax: 104.0885 ymax: 1.470775
Geodetic CRS: WGS 84
mpszSimple feature collection with 332 features and 6 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
Projected CRS: SVY21 / Singapore TM
First 10 features:
SUBZONE_N SUBZONE_C PLN_AREA_N PLN_AREA_C REGION_N
1 MARINA EAST MESZ01 MARINA EAST ME CENTRAL REGION
2 INSTITUTION HILL RVSZ05 RIVER VALLEY RV CENTRAL REGION
3 ROBERTSON QUAY SRSZ01 SINGAPORE RIVER SR CENTRAL REGION
4 JURONG ISLAND AND BUKOM WISZ01 WESTERN ISLANDS WI WEST REGION
5 FORT CANNING MUSZ02 MUSEUM MU CENTRAL REGION
6 MARINA EAST (MP) MPSZ05 MARINE PARADE MP CENTRAL REGION
7 SUDONG WISZ03 WESTERN ISLANDS WI WEST REGION
8 SEMAKAU WISZ02 WESTERN ISLANDS WI WEST REGION
9 SOUTHERN GROUP SISZ02 SOUTHERN ISLANDS SI CENTRAL REGION
10 SENTOSA SISZ01 SOUTHERN ISLANDS SI CENTRAL REGION
REGION_C geometry
1 CR MULTIPOLYGON (((33222.98 29...
2 CR MULTIPOLYGON (((28481.45 30...
3 CR MULTIPOLYGON (((28087.34 30...
4 WR MULTIPOLYGON (((14557.7 304...
5 CR MULTIPOLYGON (((29542.53 31...
6 CR MULTIPOLYGON (((35279.55 30...
7 WR MULTIPOLYGON (((15772.59 21...
8 WR MULTIPOLYGON (((19843.41 21...
9 CR MULTIPOLYGON (((30870.53 22...
10 CR MULTIPOLYGON (((26879.04 26...
15.5 Data Wrangling - Combining Aspatial and Geospatial Data
The planning subzone codes (i.e., “SUBZONE_C”) of the mpsz sf data frame are populated into the busstop sf data frame using st_intersection() function in the sf package (for point and polygon overlay, with point sf object as output). The select() function in the dplyr package is then used to retain only the “BUS_STOP_N” and “SUBZON_C” columns in the combined sf data frame. Five bus stops are excluded in the combined sf data frame because they are outside the Singapore boundary.
Student Note:
The output of the
st_intersection()function contains the geometry spatial objects that intersect betweenbusstopandmpsz. Since one contains points, and the other contains polygons, the output will contain points (as the points are the common overlapping spatial objects).The
st_drop_geometry()function is used to remove the geometry column from the combined sf data frame. The output is a regular data frame with only information of attributes.
busstop_mpsz = st_intersection(busstop, mpsz) %>%
select(BUS_STOP_N, SUBZONE_C) %>%
st_drop_geometry()
datatable(busstop_mpsz)Again, the output is saved in rds format.
write_rds(busstop_mpsz, "data/rds/busstop_mpsz.rds")Next, the planning subzone codes from the busstop_mpsz are appended to the odbus6_9 data frame’s origin bus stops.
od_data = left_join(odbus6_9 , busstop_mpsz,
by = c("ORIGIN_PT_CODE" = "BUS_STOP_N")) %>%
rename(ORIGIN_BS = ORIGIN_PT_CODE,
ORIGIN_SZ = SUBZONE_C,
DESTIN_BS = DESTINATION_PT_CODE)A check for duplicates is then conducted. Since 1,186 records are found to be duplicates, they are removed and a confirmatory check is conducted.
duplicate1 = od_data %>%
group_by_all() %>%
filter(n()>1) %>%
ungroup()
duplicate1# A tibble: 1,186 × 4
ORIGIN_BS DESTIN_BS TRIPS ORIGIN_SZ
<chr> <fct> <dbl> <chr>
1 11009 01341 1 QTSZ01
2 11009 01341 1 QTSZ01
3 11009 01411 4 QTSZ01
4 11009 01411 4 QTSZ01
5 11009 01421 17 QTSZ01
6 11009 01421 17 QTSZ01
7 11009 01511 19 QTSZ01
8 11009 01511 19 QTSZ01
9 11009 01521 2 QTSZ01
10 11009 01521 2 QTSZ01
# ℹ 1,176 more rows
od_data = unique(od_data)
duplicate2 = od_data %>%
group_by_all() %>%
filter(n()>1) %>%
ungroup()
duplicate2# A tibble: 0 × 4
# ℹ 4 variables: ORIGIN_BS <chr>, DESTIN_BS <fct>, TRIPS <dbl>, ORIGIN_SZ <chr>
Finally, the planning subzone codes from the busstop_mpsz are appended to the odbus6_9 data frame’s destination bus stops.
od_data = left_join(od_data , busstop_mpsz,
by = c("DESTIN_BS" = "BUS_STOP_N")) Another check for duplicates is then conducted. Since 1,350 records are found to be duplicates, they are removed and a confirmatory check is conducted.
duplicate3 = od_data %>%
group_by_all() %>%
filter(n()>1) %>%
ungroup()
duplicate3# A tibble: 1,350 × 5
ORIGIN_BS DESTIN_BS TRIPS ORIGIN_SZ SUBZONE_C
<chr> <chr> <dbl> <chr> <chr>
1 01013 51071 2 RCSZ10 CCSZ01
2 01013 51071 2 RCSZ10 CCSZ01
3 01112 51071 66 RCSZ10 CCSZ01
4 01112 51071 66 RCSZ10 CCSZ01
5 01112 53041 4 RCSZ10 BSSZ01
6 01112 53041 4 RCSZ10 BSSZ01
7 01121 51071 8 RCSZ04 CCSZ01
8 01121 51071 8 RCSZ04 CCSZ01
9 01121 82221 1 RCSZ04 GLSZ05
10 01121 82221 1 RCSZ04 GLSZ05
# ℹ 1,340 more rows
od_data = unique(od_data)
duplicate4 = od_data %>%
group_by_all() %>%
filter(n()>1) %>%
ungroup()
duplicate4# A tibble: 0 × 5
# ℹ 5 variables: ORIGIN_BS <chr>, DESTIN_BS <chr>, TRIPS <dbl>,
# ORIGIN_SZ <chr>, SUBZONE_C <chr>
The data frame is then tidied up, saved in rds file format, and imported into the R environment.
od_data = od_data %>%
rename(DESTIN_SZ = SUBZONE_C) %>%
drop_na() %>%
group_by(ORIGIN_SZ, DESTIN_SZ) %>%
summarise(MORNING_PEAK = sum(TRIPS))write_rds(od_data, "data/rds/od_data.rds")
od_data = read_rds("data/rds/od_data.rds")15.6 Visualising Spatial Interaction
15.6.1 Removing Intra-zonal Flows
The intra-zonal flows are removed since they will not be plotted.
od_data1 = od_data[od_data$ORIGIN_SZ!=od_data$DESTIN_SZ,]15.6.2 Creating Desire Lines
The od2line() function in the stplanr package is used to create the desire lines.
Student Note:
The
od2line()function is specifically used to convert OD flow data into lines, typically referred to as desire lines. Desire lines represent the flow of movement between different zones or locations.The “flow” argument represents the OD flow data. It could be a data frame or a matrix containing information about the number of trips (flow) between pairs of zones or locations.
The “zone” argument represents the spatial information of the zones. It is usually a spatial dataset (e.g., points, polygons) that defines the zones involved in the OD flow.
The “zone_code” argument specifies the column in the zone data set that contains the zone or location codes.
The output is a sf data frames, representing the desire lines - stored as linestrings under geometry.
flowLine = od2line(flow = od_data1,
zones = mpsz,
zone_code = "SUBZONE_C")15.6.3 Visualising Desire Lines
The desire lines are visualised using functions in the tmap package.
The “lwd” argument means that the values in the “MORNING_PEAK” column are used to determine the line width.
The “scale” argument serve as thresholds that define the ranges of quantiles (six in total, corresponding to “n” argument). The first and last values mean:
0.1: Lines with values up to 0.1 (10% quantile).
10: Lines with values between 7 and 10 (90-100% quantile).
tm_shape(mpsz) +
tm_polygons() +
flowLine %>%
tm_shape() +
tm_lines(lwd = "MORNING_PEAK",
style = "quantile",
scale = c(0.1, 1, 3, 5, 7, 10),
n = 6,
alpha = 0.3)
When the flow data are very messy and highly skewed like the one shown above, it is wiser to focus on selected flows, for example flow greater than or equal to 5,000 as shown below.
tm_shape(mpsz) +
tm_polygons() +
flowLine %>%
filter(MORNING_PEAK >= 5000) %>%
tm_shape() +
tm_lines(lwd = "MORNING_PEAK",
style = "quantile",
scale = c(0.1, 1, 3, 5, 7, 10),
n = 6,
alpha = 0.3)
Student Note: The desire lines above show a rather unexpected observation for the morning peak period for weekdays - the most movement happens between the east (Changi Airport) and the north (Woodlands Checkpoint) rather than heartlands to central Singapore (CBD). This indicates that there are commuters who enter Singapore through the Woodlands Checkpoint and head to Changi Airport for a flight out, or the reverse, commuters land in Changi Airport and head to the Woodlands Checkpoint to enter Malaysia.
~~~ End of Hands-on Exercise 3 ~~~